A Hebrew Tree Bank Based on Cantillation Marks
Abstract
In the Masoretic text of the Hebrew Bible (HB), the cantillation marks function as a punctuation system that shows the division and subdivision of each verse, forming a tree structure similar to the prosodic tree of modern linguistics. In the Masoretic text, however, this structure is hidden in a complicated set of diacritic symbols, so its rich information is accessible only to a few trained scholars. To make the structural information available to the general public and to automatic processing by computer, we built a treebank in which the hierarchical structure of each HB verse is explicitly represented in XML format. We encoded the punctuation system in a context-free grammar (CFG), which a CYK parser then used to generate trees automatically for the whole HB. The results show that (1) the CFG correctly encoded the annotation rules and (2) the annotation done by the Masoretes is highly...
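The abstract does not give the grammar itself, so the following is only a minimal Python sketch of the general technique it names: a CYK parser over a grammar in Chomsky normal form, whose chart is then turned into an XML tree. The accent labels (`conj`, `tifha`, `atnah`, `silluq`), the nonterminal names, and the helpers `cyk`, `build`, and `parse_verse` are hypothetical. The toy rules merely split a verse at an atnah-like boundary and each half at a tifha-like boundary; they are not the Masoretic rule set.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Toy grammar in Chomsky normal form. The accent labels and rules are
# illustrative assumptions, NOT the Masoretic rule set: a verse splits at an
# atnah-like boundary, and each half may split again at a tifha-like one.
LEXICAL = {  # accent label of a word -> nonterminals that can cover it
    "conj": {"CONJ"},
    "tifha": {"T_PHR"},
    "atnah": {"A_PHR", "A_HALF"},
    "silluq": {"S_PHR", "S_HALF"},
}
BINARY = {  # (left child, right child) -> possible parents
    ("CONJ", "T_PHR"): {"T_PHR"},
    ("CONJ", "A_PHR"): {"A_PHR", "A_HALF"},
    ("CONJ", "S_PHR"): {"S_PHR", "S_HALF"},
    ("T_PHR", "A_PHR"): {"A_HALF"},
    ("T_PHR", "S_PHR"): {"S_HALF"},
    ("A_HALF", "S_HALF"): {"VERSE"},
}


def cyk(tokens):
    """Fill a CYK chart: chart[(i, j)] maps a nonterminal covering
    tokens[i:j] to a backpointer (the token itself, or (split, left, right))."""
    n = len(tokens)
    chart = defaultdict(dict)
    for i, tok in enumerate(tokens):
        for nt in LEXICAL.get(tok, ()):
            chart[(i, i + 1)][nt] = tok
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        for parent in BINARY.get((left, right), ()):
                            # Keep only the first derivation found for this span.
                            chart[(i, j)].setdefault(parent, (k, left, right))
    return chart


def build(chart, i, j, nt):
    """Turn the backpointers for nonterminal `nt` over [i, j) into XML."""
    node = ET.Element("node", label=nt)
    back = chart[(i, j)][nt]
    if isinstance(back, str):  # a single word carrying accent `back`
        ET.SubElement(node, "w", accent=back, index=str(i))
    else:
        k, left, right = back
        node.append(build(chart, i, k, left))
        node.append(build(chart, k, j, right))
    return node


def parse_verse(tokens):
    """Return an XML tree for the verse, or None if the grammar rejects it."""
    chart = cyk(tokens)
    if "VERSE" not in chart[(0, len(tokens))]:
        return None
    return build(chart, 0, len(tokens), "VERSE")


tree = parse_verse(["conj", "tifha", "conj", "atnah", "tifha", "silluq"])
print(ET.tostring(tree, encoding="unicode"))
```

Each leaf `<w>` element records only the word's position and accent label; the sketch omits the Hebrew word forms. Because the chart keeps just the first derivation per span and label, a genuinely ambiguous grammar would need to keep all derivations or apply an explicit tie-breaking rule.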
Similar resources
From Prosodic Trees to Syntactic Trees
This paper describes an ongoing effort to parse the Hebrew Bible. The parser consults the bracketing information extracted from the cantillation marks of the Masoretic text. We first constructed a cantillation treebank which encodes the prosodic structures of the text. It was found that many of the prosodic boundaries in the cantillation trees correspond, directly or indirectly, to the phrase bo...
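The excerpt does not say how the parser consults the cantillation brackets. One simple possibility, assumed here purely for illustration, is to collect the constituent spans of a cantillation tree and reject any candidate syntactic phrase that crosses one of them. In this Python sketch, trees are nested lists of words, and `spans`, `crosses`, and `compatible` are hypothetical helper names.

```python
from typing import List, Set, Tuple, Union

# A tree is either a word (str) or a list of subtrees (assumed representation).
Tree = Union[str, List["Tree"]]
Span = Tuple[int, int]


def spans(tree: Tree, start: int = 0) -> Tuple[int, Set[Span]]:
    """Return the end position of `tree` and the (start, end) word spans
    of all its constituents that cover more than one word."""
    if isinstance(tree, str):
        return start + 1, set()
    pos, found = start, set()
    for child in tree:
        pos, child_spans = spans(child, pos)
        found |= child_spans
    if pos - start > 1:
        found.add((start, pos))
    return pos, found


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without one containing the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def compatible(candidate: Span, prosodic: Set[Span]) -> bool:
    """Accept a candidate syntactic span only if it crosses no prosodic span."""
    return not any(crosses(candidate, s) for s in prosodic)


# Hypothetical cantillation tree over five words w0..w4.
cantillation_tree = [["w0", "w1"], [["w2", "w3"], "w4"]]
_, prosodic_spans = spans(cantillation_tree)
print(sorted(prosodic_spans))  # → [(0, 2), (0, 5), (2, 4), (2, 5)]
print(compatible((3, 5), prosodic_spans), compatible((2, 5), prosodic_spans))  # → False True
```

Since the abstract says prosodic boundaries correspond to phrase boundaries only "directly or indirectly", a hard no-crossing filter like this one may be too strict; treating crossings as a soft penalty would be a natural alternative.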
Language Support A Simple Technique for Typesetting Hebrew with Vowel Points
This paper describes a simple mechanism for typesetting Hebrew with vowel points. Hebrew uses a large set of accents that represent vowels, consonant modifiers, and cantillation instructions. These accents are placed above, below, or inside letters; a single letter can carry several accents. The solution that we describe, which is designed for PostScript [2] output devices, leaves the placement...
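The paper targets PostScript output, and its placement method is cut off in the excerpt. As a separate, minimal illustration of the data such a system must handle, the Python sketch below uses the standard `unicodedata` module to split a Unicode-encoded Hebrew word into base letters, each followed by the vowel points and accents stacked on it. The helper name `letter_clusters` is hypothetical, and grouping by the `Mn` (nonspacing mark) category is a simplification of full Unicode grapheme segmentation.

```python
import unicodedata
from typing import List


def letter_clusters(text: str) -> List[List[str]]:
    """Group each base character with the nonspacing marks that follow it."""
    clusters: List[List[str]] = []
    for ch in text:
        # Hebrew vowel points and cantillation accents have category "Mn".
        if unicodedata.category(ch) == "Mn" and clusters:
            clusters[-1].append(ch)
        else:
            clusters.append([ch])
    return clusters


# Bet carrying dagesh and sheva, then resh carrying tsere and the etnahta accent.
word = "\u05d1\u05bc\u05b0\u05e8\u05b5\u0591"
for cluster in letter_clusters(word):
    base, marks = cluster[0], cluster[1:]
    print(unicodedata.name(base), "->", [unicodedata.name(m) for m in marks])
```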
Building a Tree-Bank of Modern Hebrew Text
This paper describes the process of building the first tree-bank for Modern Hebrew texts. A major concern in this process is the need for reducing the cost of manual annotation by the use of automatic means. To this end, the joint utility of an automatic morphological analyzer, a probabilistic parser and a small manually annotated tree-bank was explored. An initial tree-bank that consists of 50...
Vowel reduction in Modern Hebrew: Traces of the past and current variation
The aim of this paper was to find out the scope and boundaries of a-reduction in Modern Hebrew. In Classical Hebrew, vowel reduction was a regular, obligatory process. In Modern Hebrew, it has restricted scope and operates under opaque conditions. The only reliable trace of the historical motivation for the rule is the Hebrew vocalization system (nikud). 100 participants in four age groups were...